ABSTRACT

The Press-Schechter description of gravitational clustering from an initially Poisson
distribution is shown to be equivalent to the well studied Galton-Watson branching
process. This correspondence is used to provide a detailed description of the evolution
of hierarchical clustering, including a complete description of the merger history tree.
The relation to branching process epidemic models means that the Press-Schechter
description can also be understood using the formalism developed in the study of queues.
The queueing theory formalism is also used to provide a complete description of the
merger history of any given Press-Schechter clump. In particular, an analytic expression
for the merger history of any given Poisson Press-Schechter clump is obtained. This
expression allows one to calculate the partition function of merger history trees. It obeys
an interesting scaling relation: the partition function for a given pair of initial and final
epochs is the same as that for certain other pairs of initial and final epochs.

The distribution function of counts in randomly placed cells, as a function of time, 
is also obtained using the branching process and queueing theory descriptions. Thus, 
the Press-Schechter description of the gravitational evolution of clustering from an 
initially Poisson distribution is now complete. All these interrelations show why the 
Press-Schechter approach works well in a statistical sense, but cannot provide a detailed 
description of the dynamics of the clustering particles themselves. One way to extend 
these results to more general Gaussian initial conditions is discussed. 

Key words: galaxies: clustering - galaxies: evolution - galaxies: formation - cosmol- 
ogy: theory - dark matter. 



1 INTRODUCTION 

This paper is mainly concerned with providing a complete 
description of gravitational clustering from an initially Pois- 
son distribution. This is not because one thinks it likely that 
the initial conditions for gravitational clustering in our uni- 
verse were Poisson. Indeed, measurements of the spectrum of 
temperature fluctuations in the microwave background sug- 
gest otherwise. At present, however, analytic understand- 
ing of the growth of clustering from more general Gaussian 
initial conditions is not as detailed as for the Poisson case 
studied here. Thus, the Poisson model serves as a conve- 
nient toy model with which to study the evolution of non- 
linear clustering. Many of the results obtained in this paper 
should provide at least qualitative insight into the evolution 
of clustering from more general initial conditions. 

The Press-Schechter approach allows one to estimate 
the distribution of the masses of virialized clumps at a given 
epoch directly from the initial density distribution. It is 
based on the hypothesis that, provided the initial velocities 
are sufficiently small (i.e., that the initial field is sufficiently cold),
overdense regions in the initial density field will eventually collapse to
form nonlinear structures. Thus, by studying the statistics of overdense
regions in the initial field, one can infer properties of the nonlinear
distribution (Press & Schechter 1974; Bond et al. 1991; Lacey & Cole 1993). One
of the main simplifying assumptions of the Press-Schechter 
approach is that the overdense regions are assumed to col- 
lapse spherically. While this may be a good approximation 
in the mean, N-body simulations of gravitational clustering
from initially scale-free Gaussian random fields show that
the collapse of overdense regions is seldom exactly spherical.
Nevertheless, the Press-Schechter mass functions for these
initially scale-free Gaussian fields have been shown to be in
good agreement with the distribution of clump sizes that is
measured in relevant N-body simulations (Efstathiou et al.
1988; Lacey & Cole 1994). 

The Press-Schechter excursion set description of clus- 
tering from an initially Poisson distribution of identical par- 
ticles has been derived (Epstein 1983; Sheth 1995). The 
probability that a Poisson Press-Schechter clump has N par- 
ticles is 

$$ \eta(N,b) = \frac{(Nb)^{N-1}\, e^{-Nb}}{N!} \tag{1} $$



where N ≥ 1 and 0 ≤ b < 1, and b is related to the Press-Schechter
overdensity threshold δ_c:

$$ b = \frac{1}{1 + \delta_c}. \tag{2} $$

For an initially cold Poisson distribution, the threshold overdensity δ_c
decreases as the Universe expands in such a way that b = 0 initially, and
b → 1 as the clustering develops. If clumps collapse spherically in a universe
with critical density, and both growing and decaying modes are present in the
linear perturbation theory, then δ_c = (5/3) 1.69/a, where a is the expansion
factor (e.g. Bond et al. 1991; Lacey & Cole 1993). Thus, b changes rapidly at
first, but at late times its evolution is slowed by the expansion of the
Universe. (See section 2.3 in Sheth 1995 for another description of the
evolution of b that shows this same behaviour.) Since b is known to increase
monotonically with time, it will be treated as a pseudo-time variable in the
remainder of this paper.

Equation (1) is known as a Borel distribution (Borel
1942). Epstein (1983) derived this distribution by studying 
the properties of level excursions of a Poisson distribution. 
His approach is the discrete analog of that which was used 
later by Bond et al. (1991) in their analysis of the initially 
Gaussian random fields. The Borel distribution can also be 
derived using simple 'cloud-in-cloud' conditional probabili- 
ties for a Poisson distribution (Sheth 1995). This approach 
is the discrete analog of Jedamzik's (1995) treatment of the 
Gaussian case. 
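
As a numerical illustration of the Borel distribution discussed above, the following minimal sketch evaluates η(N, b) = (Nb)^(N−1) e^(−Nb)/N! in log space and checks that it sums to one with mean 1/(1 − b). The function names `borel_pmf` and `b_from_threshold`, and the truncation of the sums at N = 2000, are choices made here for illustration and are not part of the original analysis.

```python
import math


def borel_pmf(n: int, b: float) -> float:
    """Borel probability eta(n, b) = (n b)^(n-1) exp(-n b) / n! for n >= 1."""
    if n < 1:
        return 0.0
    if b == 0.0:
        # With b = 0 every clump is a single particle.
        return 1.0 if n == 1 else 0.0
    # Work with logarithms so that large n does not overflow.
    log_p = (n - 1) * math.log(n * b) - n * b - math.lgamma(n + 1)
    return math.exp(log_p)


def b_from_threshold(delta_c: float) -> float:
    """Pseudo-time variable b = 1 / (1 + delta_c)."""
    return 1.0 / (1.0 + delta_c)


if __name__ == "__main__":
    b = b_from_threshold(1.0)  # delta_c = 1 corresponds to b = 0.5
    probs = [borel_pmf(n, b) for n in range(1, 2000)]
    total = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs, start=1))
    print(f"b = {b}, sum = {total:.6f}, mean = {mean:.6f}")
```

For b close to 1 the tail of the distribution becomes long, so the truncation point would need to be raised accordingly.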

If the probability that a randomly chosen clump contains N particles is a
Borel distribution, then the probability that a randomly chosen particle is in
such a clump is N η(N,b)/⟨N⟩ = (1 − b) N η(N,b), since the average number of
particles in a Borel clump is ⟨N⟩ = 1/(1 − b). In the limit of large N and
small δ_c, Stirling's approximation for the factorial term implies that

$$ (1-b)\,N\,\eta(N,b) = \frac{\delta_c}{1+\delta_c} \left(\frac{N}{1+\delta_c}\right)^{N-1} \frac{e^{-N/(1+\delta_c)}}{(N-1)!} \;\to\; \frac{\delta_c}{\sqrt{2\pi N}}\, \exp\left(-\frac{N\delta_c^2}{2}\right) \tag{3} $$




(Epstein 1983; Sheth 1995). The final expression is precisely 
that which obtains for a Gaussian density field with white 
noise initial fluctuations (e.g. Bond et al. 1991). This shows 
that the Poisson distribution studied in the remainder of 
this paper can be thought of as the discrete analog of the 
white noise Gaussian studied by Bond et al. (1991), and by 
Lacey & Cole (1993, 1994). 
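
The approach to the white-noise Gaussian form can be checked numerically. The sketch below compares the exact mass-weighted Borel expression (1 − b) N η(N, b), with b = 1/(1 + δ_c), against the limiting form δ_c/√(2πN) exp(−N δ_c²/2) that follows from Stirling's approximation. The function names and the sample values of N and δ_c are illustrative assumptions.

```python
import math


def mass_weighted_borel(n: int, delta_c: float) -> float:
    """Exact (1 - b) n eta(n, b) with b = 1 / (1 + delta_c)."""
    b = 1.0 / (1.0 + delta_c)
    log_eta = (n - 1) * math.log(n * b) - n * b - math.lgamma(n + 1)
    return (1.0 - b) * n * math.exp(log_eta)


def white_noise_limit(n: int, delta_c: float) -> float:
    """Large-n, small-delta_c limit: delta_c / sqrt(2 pi n) exp(-n delta_c^2 / 2)."""
    return delta_c / math.sqrt(2.0 * math.pi * n) * math.exp(-0.5 * n * delta_c ** 2)


if __name__ == "__main__":
    for n, delta_c in [(100, 0.1), (10_000, 0.01), (1_000_000, 0.001)]:
        exact = mass_weighted_borel(n, delta_c)
        approx = white_noise_limit(n, delta_c)
        print(f"N = {n:>8}, delta_c = {delta_c}: exact/approx = {exact / approx:.5f}")
```

The leading correction to the exponent is of order N δ_c³, so the two expressions agree only when that product is small.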

Sections 2 and 3 study other derivations of the Borel 
distribution. These derivations provide new insight into the 
accuracy and applicability of the Press-Schechter approach. 
An analytic expression that is, essentially, the partition func- 
tion that describes all possible merger histories of a given 
Press-Schechter clump is derived in Section 3. Its properties 
are consistent with those inferred previously (Sheth 1995). 
In particular, it is consistent with the physical requirement
that, in the limit of very small time steps, the probability
that a clump has two progenitors should be an infinitesimal
of lower order than the probability that it has three,
or more, progenitors. Section 3 also shows that the partition
function, and so the growth of clustering, satisfies an
interesting scaling relation. In Section 4, this scaling relation
is exploited to provide insight into the details of the
growth of hierarchical clustering. Section 4 also shows that
this Poisson Galton-Watson, Press-Schechter description is
in qualitative agreement with the results of the numerical
algorithm developed by Kauffmann & White (1993). Section
5 discusses ways in which the branching process extension 
of the Press-Schechter approach that is developed in this 
paper can be extended to provide a detailed description of 
clustering from initially Gaussian random fields. Appendices 
A, B and C provide details of some of the calculations. 

Appendix D contains a derivation of the distribution
of counts in randomly placed cells by extending some of the
ideas developed in this paper. Although all the results of this
paper are independent of those derived in Appendix D, it
has been included because the final result in it (equation D3)
is known to be accurate. Thus, Appendix D provides one
natural way in which the usual Press-Schechter analysis may
be extended to provide additional information about the
evolution of nonlinear gravitational clustering.



2 A RELATION BETWEEN THE SPREAD OF 
DISEASE AND THE PRESS-SCHECHTER 
APPROACH 

Consider a disease that is spread in accordance with the fol- 
lowing model. Assume that, initially, there is a single carrier 
of the disease. Assume that this carrier is capable of infect- 
ing others, and that the probability that it infects k others 
is given by a Poisson distribution. If the initial carrier is 
thought of as belonging to the zeroth generation, then each 
of these newly infected carriers belongs to the first genera- 
tion. Assume that each of the members of the first generation 
is, in turn, capable of infecting still others, who will make 
up the second generation. The members of the second gen- 
eration infect still others who make up the third generation, 
who infect still others, and so on. Assume that, in any gener- 
ation, the probability that a carrier is able to infect k others 
is given by a Poisson distribution that is specified by the 
parameter, b, say. It is possible that, by chance, none of the
carriers in the nth generation infects any new members. In
this case the number of members in the (n + 1)th generation
is zero. If this should happen, the spread of the disease will 
be said to be halted, and we can ask for the probability that 
there were N people infected in total, including the initial 
member in the zeroth generation. 

This model for the spread of disease has been studied 
in some detail. It is a Galton-Watson branching process in 
which the distribution of the number of progeny of a given 
member of each generation is a Poisson distribution with 
mean b < 1 (see, e.g., Harris 1963). This Galton-Watson
process has an analytic solution. The probability that N
people were infected in total is given by a Borel distribution
with parameter b (Otter 1949; Good 1960; Consul 1989).
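
The epidemic model described above is simple to simulate. The following sketch draws Poisson numbers of new infections generation by generation until the outbreak halts, and compares the frequency of each total outbreak size with the Borel distribution. The Poisson sampler (Knuth's multiplication method), the function names, the seed and the number of trials are all illustrative choices, not part of the original analysis.

```python
import math
import random
from collections import Counter


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def total_infected(b: float, rng: random.Random) -> int:
    """Total size of one outbreak, counting the initial carrier (requires b < 1)."""
    total = 1
    active = 1  # carriers in the current generation
    while active > 0:
        new = sum(poisson_sample(b, rng) for _ in range(active))
        total += new
        active = new
    return total


def borel_pmf(n: int, b: float) -> float:
    """Borel probability (n b)^(n-1) exp(-n b) / n!, valid for 0 < b < 1."""
    return math.exp((n - 1) * math.log(n * b) - n * b - math.lgamma(n + 1))


def outbreak_size_frequencies(b: float, trials: int, seed: int = 0) -> dict:
    """Empirical distribution of total outbreak sizes."""
    rng = random.Random(seed)
    counts = Counter(total_infected(b, rng) for _ in range(trials))
    return {size: c / trials for size, c in counts.items()}


if __name__ == "__main__":
    b = 0.5
    freq = outbreak_size_frequencies(b, trials=20_000, seed=1)
    for n in range(1, 6):
        print(n, round(freq.get(n, 0.0), 4), round(borel_pmf(n, b), 4))
```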

It is possible to use this Poisson Galton-Watson branch- 
ing process to study the growth of gravitational clustering 
from an initially Poisson distribution. Consider a randomly 
chosen point in a Poisson distribution; this point comprises 
the zeroth generation. All points that are within a given 
'contagious' volume, say v_c, around this point will be considered to be
infected. Since the distribution is Poisson, the probability that this point
is able to infect k others is Poisson, and the parameter that specifies this
Poisson distribution is related to the distance from the carrier out to which
the disease is contagious. For convenience, assume that v_c is defined so
that the parameter of the Poisson distribution is b = n v_c, where n is the
average density. These k points make up the members of the first generation.
However, each member of the first generation will also have been able to
'infect', say, j others, corresponding to the j neighbours that could have
been (Poisson distributed) within v_c from it. The set of all these
neighbours of all the k members of the first generation makes up the second
generation, and so on. It is clear that this situation is similar to the
one described in the previous paragraph. This means that this Galton-Watson
model for clustering from an initially Poisson distribution implies that the
distribution of nonlinear clump sizes is a Borel distribution. In other
words, this Galton-Watson model provides the same description of nonlinear
clustering as the better known Press-Schechter type analyses described
earlier.

Figure 1. A clump with six member particles identified in the
initial particle distribution using the friends-of-friends percolation
model. In the percolation model, clumps merge with each other
as the link length used to define friends-of-friends increases. The
overlapping of volumes makes the percolation model difficult to
treat analytically.

Figure 2. A branching process model of hierarchical clustering.
The number of particles in each oval is a random variable determined
by the (Poisson) initial conditions. The two ovals for each
particle represent two different epochs, corresponding to two different
link lengths, or two different over-density thresholds, with
oval size increasing (over-density threshold decreasing) with time.
In this example, the clump of six particles was composed of four
single particles and one pair at the earlier epoch. The fact that
different volumes are assumed to not overlap makes this branching
process analytically tractable.

For this description to be exactly like the Galton-Watson process
described above, we must assume that the probability that one of the
members in the first generation has j neighbours within v_c from it is
independent of the fact that it is one of k particles within v_c from the
initial particle in the zeroth generation. This is the same as assuming
that the volume v_c centered on the initial particle does not even
partially overlap the volume v_c centered on each of the k members in the
first generation. Clearly, this assumption is false; the assumption that
the volumes never overlap is a gross simplification. Nevertheless, we will
continue considering this simplified model, since it provides the same mass
function as the Press-Schechter Borel distribution.

One can argue that the simplification which enabled us to pose the problem
in terms of this Galton-Watson model suggests one way in which the
Press-Schechter mass functions could be modified. Accounting for the fact
that it is possible for the volume v_c centered on the initial particle to
partially overlap the volume v_c centered on one of the k members in the
first generation, and so on, is the problem known as friends-of-friends
percolation. Clearly, percolation is distinct from the Galton-Watson
process described above. The percolation model as formulated here can be
developed as an alternative model for the growth of clustering. In
principle, friends-of-friends clump mass functions can be calculated from
the initial distribution; they are functions of the percolation link length
(which is simply related to the size of the 'contagious' volume v_c), and
are different from the clump mass functions determined using the
Galton-Watson model. Moreover, mergers are also well-defined in the
percolation model, as is the partition function of merger trees. Thus, the
percolation model is at least as well defined as the Press-Schechter
excursion set model. However, at present, neither the distribution of
friends-of-friends clump sizes, nor the associated merger probabilities can
be calculated analytically. Since the Galton-Watson model provides the same
distribution of clump masses as do the excursion set, or the
cloud-in-cloud, analyses of the Press-Schechter approach, the percolation
model will not be considered further in this paper.



3 THE GALTON-WATSON DESCRIPTION OF 
HIERARCHICAL CLUSTERING 

The excursion set formulation of the Press-Schechter theory 
shows clearly how to formulate and solve for a description 
of merging and hierarchical clustering (Bond et al. 1991; 
Lacey & Cole 1993). Essentially, the merging problem re- 
duces to solving a two-barrier problem that is analogous to 
the one-barrier level crossing problem that was solved to 
obtain the Press-Schechter multiplicity function. For an ini- 
tially Poisson distribution the two-barrier problem has also 
been solved. The probability that, at the epoch b_1, one of
the progenitors of a clump with k particles at the epoch
b_2 > b_1 was of size j < k is easily related to the conditional
distribution

$$ f(j,b_1|k,b_2) = k\left(1-\frac{b_1}{b_2}\right)\binom{k}{j}\,\frac{j^{j}}{k^{k}}\left(\frac{b_1}{b_2}\right)^{j-1}\left(k - j\,\frac{b_1}{b_2}\right)^{k-j-1}, \tag{4} $$

where b_1 < b_2 and k > j (equation 40 in Sheth 1995). Here,
f(j, b_1|k, b_2) is the probability that a particle which is in a
k-particle clump at the epoch b_2 was in a j-particle clump at
the epoch b_1. The statements f(j, b_1|k, b_2) are independent
of what happened at times previous to b_1 and also of what
will happen at times later than b_2. Appendix B shows that,
in the appropriate limits (i.e., k ≫ 1, j ≫ 1, and k − j ≫ 1,
and also b_1 → 1 and b_2 → 1), equation (4) reduces to
equation (2.15) of Lacey & Cole (1993) (also see Fig. 2 and
associated algebra in Sheth 1995).

Following Lacey & Cole (1993), equation (4) can be manipulated
to provide an expression for the probability that
a clump with k particles at the epoch b_2 was formed from
other subclumps that merged with a subclump that had exactly
j particles at the epoch b_1. However, it does not provide
any information about the distribution of sizes of the
other subclumps, other than the restriction that the total
number of particles in those other subclumps must sum to
k − j. Another way of thinking of f(j, b_1|k, b_2) is to note that
it is obtained without consideration of the different ways
in which the object of size j (at the epoch b_1) could have
merged with other objects to form the final object of size k
(at the epoch b_2). If the merging process is visualized as a
tree (so trees having different numbers of branches, branch
sizes and branching points describe different merger histories),
a useful quantity is the probability that a particular
tree structure, rather than any other, occurs. So, the problem
is to solve for what is, essentially, the partition function
for various tree structures.

Recall that the statements f(j, b_1|k, b_2) are independent
of what happened at times previous to b_1 and also of what
will happen at times later than b_2. So, to solve for the entire
merger history tree (i.e., for many epochs b_1 < b_2 < ⋯) one need
only know how to solve for the tree structure at any given
two epochs, say, b_1 and b_2. Therefore, in what follows the
later epoch, b_2, is referred to as the final epoch. Notice also
that the statements f(j, b_1|k, b_2) imply that it is possible
to calculate the partition function for a tree with k particles
without explicitly considering the partition function for trees
with l ≠ k particles. So, for any given pair of epochs b_1 and
b_2, and for any tree whose final size (i.e., at the epoch b_2) is
k, the problem is to calculate all possible tree configurations,
and to assign to each configuration the probability that it
actually occurs. The elements of the set of all possible
k-particle tree configurations are simply the various ways of
partitioning the integer k (physically, this is what is required
by mass conservation).

Let p(1^{n_1} 2^{n_2} ⋯ k^{n_k}|k), with n_1 + ⋯ + n_k = m and
Σ_{j=1}^{k} j n_j = k, denote the probability that the final clump
with k particles at b_2 had m progenitors at b_1, of which n_1
were singles, n_2 were pairs, n_j were subclumps with j particles
each, and so on. Note that p(1^{n_1} 2^{n_2} ⋯ k^{n_k}|k) is
independent of the order of {1^{n_1} 2^{n_2} ⋯ k^{n_k}} since different
permutations of the 'branches' should all have the same probability
of occurring. The problem is to calculate this probability
for given k, m, b_2, b_1 and {n_1, n_2, …, n_k}. To date,
it has not proven possible to calculate this probability exactly
from the excursion set description itself (Kauffmann
& White 1993; Sheth 1995). In this section we will use the
Galton-Watson interpretation of the clustering process to
obtain probabilities for the various partitions of k.

The Galton-Watson process can be used to formulate a
description of the merger history of a clump as follows. As
before, consider the single member of the zeroth generation.
This person has a number of children, and they each have
some number of children, and so on down the family tree. If
the probability that a given member of the tree has n children is
given by a Poisson distribution with parameter b_2, then we
have the same branching process as before. The probability
that such a family tree has N members in total, after which
the family died out, is given by a Borel distribution with
parameter b_2. Now consider the more complicated case in
which some of the children are male and some female. Assume
that the probability that a given member on the tree
has n_1 daughters is a Poisson distribution with parameter
b_1 and the probability of having n_2 sons is a Poisson
distribution with parameter b_2 − b_1. This is the same as assuming
that the probability that a given member on the tree has
n children is given by a Poisson process with parameter b_2,
and the probability that n_1 are daughters and n_2 = n − n_1
are sons is given by a Binomial distribution where the probability
that any given child is female is p = b_1/b_2. Consider
the family tree of such a process, and assume that the initial
ancestor was male. For such a family tree, we can ask for the
probability that there were N_1 females and N_2 = N − N_1
males in total in the family tree. This branching process is
a special case of that studied by Good (1960).

Now imagine drawing the family tree. Use dots to rep- 
resent children of either sex, but only draw lines between 
parents and their daughters. Then the tree will consist of 
a number of groups, in which each member of the group is 
connected to other members of the same group, but not to 
members from different groups. Call these groups subfam- 
ilies. (The tree is inherently sexist, as the first member at 
the head of every new subfamily is always male.) 

We will characterize trees by the distribution of the sizes
of the subfamilies in them. Let n_i denote the number of
subfamilies each with i members. Then we can ask for the
probability p(1^{n_1} ⋯ k^{n_k}|k) that a tree with k members has
exactly m subfamilies (so n_1 + ⋯ + n_k = m), and that there
were n_1 singles, n_2 pairs, n_j j-tuples, and so on. Clearly, this
characterization of a family tree (by the size and number of
subfamilies in it) is exactly analogous to the distribution of
subclump sizes for a given Press-Schechter clump. It provides
us with probabilities for the various partitions of k.
In particular, this extension of the Poisson Galton-Watson
branching process requires that

$$ p(1^{n_1} 2^{n_2} \cdots k^{n_k}|k) = \frac{\eta(1,b_1)^{n_1}\,\eta(2,b_1)^{n_2}\cdots\eta(k,b_1)^{n_k}}{\eta(k,b_2)}\; \frac{m!}{n_1!\,n_2!\cdots n_k!}\; \frac{1}{m}\,\frac{[k(b_2-b_1)]^{m-1}\,e^{-k(b_2-b_1)}}{(m-1)!}, \tag{5} $$

where η(l, b) is the Borel distribution with parameter b
(eq. 1), and n_j denotes the number of subfamilies in the tree
that have exactly j members (with the usual convention that
0! = 1). Since this branching process is related to the
gravitational clustering process, the probability that a clump of
size k at the epoch b_2 had the m progenitors n_1, n_2, …, n_k
at the epoch b_1 is given by equation (5). Thus, equation (5)
can be used to generate the partition function of merger
history trees. It is the main result of this paper.
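
The partition function can be tabulated directly for small clumps. The sketch below assumes that equation (5) has the form [∏_j η(j, b_1)^{n_j} / η(k, b_2)] × [m!/(n_1! ⋯ n_k!)] × (1/m) × [k(b_2 − b_1)]^{m−1} e^{−k(b_2 − b_1)}/(m − 1)!, enumerates the integer partitions of k, and checks that the probabilities sum to one. The helper names `partitions` and `partition_probability` are hypothetical, introduced here only for illustration.

```python
import math
from collections import Counter


def borel_pmf(n: int, b: float) -> float:
    """Borel probability (n b)^(n-1) exp(-n b) / n!, valid for 0 < b < 1."""
    return math.exp((n - 1) * math.log(n * b) - n * b - math.lgamma(n + 1))


def partitions(k: int, largest=None):
    """Yield the integer partitions of k as non-increasing tuples of parts."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def partition_probability(parts, b1: float, b2: float) -> float:
    """Probability that a k-clump at b2 had the progenitor sizes `parts` at b1 < b2."""
    k, m = sum(parts), len(parts)
    counts = Counter(parts)  # maps progenitor size j to its multiplicity n_j
    eta_product = math.prod(borel_pmf(j, b1) ** n_j for j, n_j in counts.items())
    multinomial = math.factorial(m) / math.prod(math.factorial(n_j) for n_j in counts.values())
    poisson_term = (k * (b2 - b1)) ** (m - 1) * math.exp(-k * (b2 - b1)) / math.factorial(m - 1)
    return eta_product / borel_pmf(k, b2) * multinomial * poisson_term / m


if __name__ == "__main__":
    b1, b2, k = 0.3, 0.6, 6
    table = {parts: partition_probability(parts, b1, b2) for parts in partitions(k)}
    for parts, prob in table.items():
        print(parts, round(prob, 5))
    print("sum over partitions:", sum(table.values()))
```

Because the number of partitions grows quickly with k, a brute-force enumeration of this kind is practical only for modest clump sizes.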
Appendix B shows that this partition function is consistent
with the merger probabilities of equation (4). It also
provides additional insight into the origin of the various
terms in equation (5). Appendix C describes a queueing process
that provides another way to derive this expression.



3.1 Some properties of the partition structure 

Some limits of equation (5) are worth studying. As b_2 → b_1,
p(1^{n_1} ⋯ k^{n_k}|k) → 0, except when m = 1, for which
p(k^1|k) → 1 as required. As b_1 → 0, then η(l, b_1) → 0 except
for l = 1, so that

$$ \lim_{b_1\to 0} p(1^{n_1}\cdots k^{n_k}|k) = p(1^k|k) = \frac{(k b_2)^{k-1}\, e^{-k b_2}}{k!\;\eta(k,b_2)} = 1. \tag{6} $$



In other words, in the limit as b_1 → 0, all progenitors are
certainly single particles, which is also the expected result.

The probability n(m|k) that a clump of size k at the
epoch b_2 has m progenitors at the epoch b_1 is given by summing
p(1^{n_1} ⋯ k^{n_k}|k) over all distinct sets of m integers that
add up to k. Now, any set specified by {n_1, …, n_k} with
n_1 + ⋯ + n_k = m can be written explicitly in terms of its m
members as {l_1, …, l_m}. Thus, l_1 + ⋯ + l_m = Σ_{i=1}^{k} i n_i = k.
So,



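$$ \begin{aligned}
n(m|k) &= \sum_{m\ \mathrm{parts}} \frac{\eta(1,b_1)^{n_1}\cdots\eta(k,b_1)^{n_k}}{\eta(k,b_2)}\, \frac{m!}{n_1!\cdots n_k!}\, \frac{1}{m}\, \frac{[k(b_2-b_1)]^{m-1}\, e^{-k(b_2-b_1)}}{(m-1)!} \\
&= \sum_{m\ \mathrm{parts}} \frac{m!}{n_1!\cdots n_k!} \left[\prod_{j=1}^{k} \left(\frac{(j b_1)^{j-1}\, e^{-j b_1}}{j!}\right)^{n_j}\right] \frac{k!}{(k b_2)^{k-1}\, e^{-k b_2}}\; \frac{1}{m}\, \frac{[k(b_2-b_1)]^{m-1}\, e^{-k(b_2-b_1)}}{(m-1)!} \\
&= \frac{k!}{m!\,k^{k-m}} \left(\frac{b_1}{b_2}\right)^{k-m} \left(1-\frac{b_1}{b_2}\right)^{m-1} \sum_{\substack{l_1,\ldots,l_m \\ \mathrm{dist.\ perms.}}} \prod_{i=1}^{m} \frac{l_i^{\,l_i-1}}{l_i!} \\
&= \frac{k!}{m!\,k^{k-m}} \left(\frac{b_1}{b_2}\right)^{k-m} \left(1-\frac{b_1}{b_2}\right)^{m-1} \frac{m\,k^{k-m-1}}{(k-m)!} \\
&= \binom{k-1}{m-1} \left(\frac{b_1}{b_2}\right)^{k-m} \left(1-\frac{b_1}{b_2}\right)^{m-1}.
\end{aligned} \tag{7} $$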



The sum in the third equality is over all sets of m integers
which satisfy l_1 + ⋯ + l_m = k, and over all the distinct
permutations of each set (which accounts for the multinomial
factor m!/n_1! ⋯ n_k!). The second from last equality
follows from a combinatorial identity (note the similarity to
the Borel-Tanner distribution of equation B3; also see Sheth
& Saslaw 1994). The final expression for n(m|k) is the same
as that obtained previously (eq. 52 in Sheth 1995).

It is easy to see that this expression is sensible. Referring
back to the Galton-Watson family tree description, recall
that the head of the tree is always the initial male, and other
males start subfamilies within the tree. So, the number, m,
of subfamilies within a family tree having k members in total
is equal to the number of male members in the tree. Now, the
probability that any given child is male is q = 1 − p = (b_2 − b_1)/b_2.
Since it is certain that one member of the tree is a male (the
head of the family tree is always a male), the probability
that there are m males in a family with k members should
be [(k − 1)!/((k − m)!(m − 1)!)] p^{k−m} q^{m−1}. This is identical to
the Binomial distribution of equation (7). Alternatively, this
expression for the number of subclumps of a k-sized clump,
which is the same as the number of sons in a k-sized family,
can be obtained directly from Good's (1960) treatment of
the Galton-Watson process with many types of progeny. The
brute force calculation is given in Appendix A. It, too, shows
that n(m|k) is given by equation (7).
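
The family tree argument can be checked with a small Monte Carlo experiment. The sketch below simulates trees in which each member has a Poisson(b_1) number of daughters and a Poisson(b_2 − b_1) number of sons, keeps the trees with exactly k members, and compares the frequency with which they contain m males (counting the male head of the tree) with the Binomial expression [(k − 1)!/((k − m)!(m − 1)!)] p^{k−m} q^{m−1}. The sampler, function names, seed and trial count are illustrative assumptions.

```python
import math
import random
from collections import Counter


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_family(b1: float, b2: float, rng: random.Random):
    """Return (members, males) for one family tree headed by a male (requires b2 < 1)."""
    members, males = 1, 1
    pending = 1  # members whose children have not yet been drawn
    while pending > 0:
        pending -= 1
        daughters = poisson_sample(b1, rng)
        sons = poisson_sample(b2 - b1, rng)
        members += daughters + sons
        males += sons
        pending += daughters + sons
    return members, males


def subfamily_frequencies(k: int, b1: float, b2: float, trials: int, seed: int = 0) -> dict:
    """Empirical distribution of the number of males among trees with exactly k members."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        members, males = simulate_family(b1, b2, rng)
        if members == k:
            counts[males] += 1
    kept = sum(counts.values())
    return {m: counts[m] / kept for m in range(1, k + 1)} if kept else {}


def binomial_prediction(m: int, k: int, p: float) -> float:
    """Probability of m subfamilies in a k-member tree, with p = b1 / b2."""
    return math.comb(k - 1, m - 1) * p ** (k - m) * (1 - p) ** (m - 1)


if __name__ == "__main__":
    b1, b2, k = 0.3, 0.6, 4
    freq = subfamily_frequencies(k, b1, b2, trials=60_000, seed=1)
    for m in range(1, k + 1):
        print(m, round(freq[m], 3), round(binomial_prediction(m, k, b1 / b2), 3))
```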

Equation (5) is also consistent with the physical requirement
that, in the limit of very small time steps, the
probability that a clump has two progenitors should be an
infinitesimal, the probability that the clump has three progenitors
should be an infinitesimal of the next higher order,
and so on. It also implies that massive clumps form preferentially
from massive progenitor clumps: on average, the
size, at some earlier epoch, of the largest progenitor clump of
a clump at some given final epoch depends significantly on
how massive the final clump is relative to the characteristic
mass at the final epoch (eq. 55 in Sheth 1995).

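In the limit of large k and large subclumps l_i, equation (5)
has an interesting interpretation. Stirling's formula
for the factorials implies that

$$ \begin{aligned}
p(l_1,\ldots,l_m|k) &= \frac{k!}{l_1!\cdots l_m!}\, \frac{l_1^{\,l_1-1}\cdots l_m^{\,l_m-1}}{k^{k-1}}\, \frac{1}{n_1!\cdots n_k!} \left(\frac{b_1}{b_2}\right)^{k-m} \left[k\left(1-\frac{b_1}{b_2}\right)\right]^{m-1} \\
&\to \frac{1}{n_1!\cdots n_k!} \left(\frac{b_1}{b_2}\right)^{k-m} \left[k\left(1-\frac{b_1}{b_2}\right)\right]^{m-1} \sqrt{2\pi k^3}\, \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi l_i^3}} \\
&= \frac{1}{n_1!\cdots n_k!} \left(\frac{b_1}{b_2}\right)^{k-m} \left(1-\frac{b_1}{b_2}\right)^{m-1} \frac{1}{(2\pi k)^{(m-1)/2}} \prod_{i=1}^{m} x_i^{-3/2},
\end{aligned} $$
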

where we have defined x_i = l_i/k. The first term on the right
is the combinatorial term that is included to ensure that each
distinct combination of subclumps is counted only once. The
second set of terms accounts for the probability that there
are exactly m subclumps. The final product term is the most
interesting. It shows that, for a given value of the final clump
size k, and if m ≪ k, the case in which the subclumps are all
approximately the same size is much less likely than the case
in which one of the subclumps is very much more massive
than all the others.

The partition function (eq. 5) exhibits an interesting
scaling. It depends on the initial and final epochs, b_1 and
b_2, only through the combination p = b_1/b_2. As one would
expect, this is also true of the simpler statement given in
equation (7). This means that the probability p(l_1, …, l_m|k),
given the two epochs b_1 and b_2, will be the same as the
probability p'(l_1, …, l_m|k), given b'_1 and b'_2, provided that
b_1/b_2 = b'_1/b'_2 = p. In this sense, the clustering evolves in
a self-similar fashion. This scaling can be exploited when
comparing equation (5) with merger histories of clumps in
N-body simulations. When b_1 → 1 and b_2 → 1, the ratio
b_1/b_2 = (1 + δ_c2)/(1 + δ_c1) → 1 + δ_c2 − δ_c1. Thus, the scaling
in b_1/b_2 corresponds to the scaling in (δ_c1 − δ_c2) discussed,
for example, by Bond et al. (1991) and by Lacey & Cole
(1993).
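
The dependence on the ratio alone can be made explicit with a short worked step, sketched here under the assumption that the partition function has the product form of Borel factors, multinomial factor and Poisson term used in this section. Substituting the Borel distribution η(j, b) = (jb)^{j−1} e^{−jb}/j! and using Σ_j n_j = m and Σ_j j n_j = k, the exponentials e^{−k b_1}, e^{−k(b_2 − b_1)} and e^{k b_2} cancel, and the remaining powers of b_1, b_2 − b_1 and b_2 combine into powers of p = b_1/b_2:

$$ p(1^{n_1}\cdots k^{n_k}|k) = \frac{k!}{k^{k-m}}\; p^{\,k-m}\,(1-p)^{m-1} \prod_{j=1}^{k} \frac{1}{n_j!} \left(\frac{j^{\,j-1}}{j!}\right)^{n_j}, \qquad p = \frac{b_1}{b_2}. $$

As a check, for k = 2 this gives p(2^1|2) = p and p(1^2|2) = 1 − p, which sum to one.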



4 APPLICATIONS 

One example of the type of statistical question that we can
now answer is as follows. Suppose we are interested in the
distribution of sizes of the largest progenitor clump at some
epoch b_1, of a given clump at the epoch b_2. This might be
of interest, for example, in studies of the Butcher-Oemler
effect (Bower 1991; Kauffmann 1995). It is straightforward
to compute the relevant sums over the partition function
numerically. As an example, Fig. 3 shows the probability
that the largest subclump of a clump with k particles has
l_1 particles, for two choices of k, and for two choices of the
ratio p = b_1/b_2 of the initial and final epochs.

Three features of Fig. 3 are obvious. First, for given k,
the curves depend strongly on p, the ratio of the two epochs
b_1 and b_2. As p increases, the curves peak at higher values
of l_1/k. This means that the largest progenitor of a given
clump is a smaller fraction of the total mass as the time
between the final epoch and the epoch at which the progenitors
were identified increases. This simply reflects the
fact that the clustering is hierarchical; small clumps merge
to form big clumps, and on average, clumps were smaller
in the more distant past than they are at present. Second,
the curves depend strongly on k, the number of particles in
the final clump. For a given value of p, curves with higher
values of k peak further towards the left. That is, for a given
value of p, the largest progenitor of a massive clump is more
likely to be a smaller fraction of the total mass than is the
largest progenitor of a less massive clump. In this sense,
for any choice of initial and final epochs (because the partition
function only depends on the ratio b_1/b_2), more massive
clumps always appear to have assembled a larger fraction of
their mass more recently than less massive clumps. Lacey
& Cole (1993) show that this is also what happens in the
Gaussian case. Furthermore, it is consistent with the analytical
result that, for a clump that has k particles in total, the
average number of particles in a progenitor subclump (not
necessarily the largest subclump) is k/[p + k(1 − p)] (Sheth
1995).
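
The sums over the partition function described above can be carried out by brute force for small k. The sketch below is one possible way to do this: it assumes the partition function written in terms of the ratio p = b_1/b_2 alone, namely (k!/k^{k−m}) p^{k−m} (1 − p)^{m−1} ∏_j (j^{j−1}/j!)^{n_j}/n_j!, enumerates all partitions of k, and accumulates the probability by the size of the largest part. The function names are hypothetical, and this is not necessarily how the curves in the figure were produced.

```python
import math
from collections import Counter


def partitions(k: int, largest=None):
    """Yield the integer partitions of k as non-increasing tuples of parts."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def partition_probability(parts, p: float) -> float:
    """Probability of the progenitor sizes `parts`, as a function of p = b1/b2 only."""
    k, m = sum(parts), len(parts)
    weight = math.factorial(k) / k ** (k - m) * p ** (k - m) * (1 - p) ** (m - 1)
    for j, n_j in Counter(parts).items():
        weight *= (j ** (j - 1) / math.factorial(j)) ** n_j / math.factorial(n_j)
    return weight


def largest_progenitor_distribution(k: int, p: float) -> list:
    """dist[l] is the probability that the largest progenitor has l particles."""
    dist = [0.0] * (k + 1)
    for parts in partitions(k):
        dist[parts[0]] += partition_probability(parts, p)  # parts[0] is the largest part
    return dist


def mean_largest_fraction(k: int, p: float) -> float:
    """Average of (largest progenitor size) / k."""
    dist = largest_progenitor_distribution(k, p)
    return sum(l * prob for l, prob in enumerate(dist)) / k


if __name__ == "__main__":
    for p in (0.3, 0.9):
        dist = largest_progenitor_distribution(10, p)
        print(f"p = {p}:", [round(x, 4) for x in dist[1:]])
        print("  mean largest fraction:", round(mean_largest_fraction(10, p), 4))
```

The number of partitions grows rapidly with k (there are 204226 partitions of 50), so larger clumps would benefit from a recursion rather than explicit enumeration.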

So, we expect the average size, and the most probable
size, of the largest progenitor clump to decrease as p decreases.
However, based on Fig. 3, the scaling property of
the partition function, and the fact that b evolves quickly
initially and slower at later times (see discussion following
eq. 2), we can also study the rate with which this size decreases.
As a specific example, consider two clumps, each of
size k. Assume that one of the clumps is completely assembled
at epoch b_2 and the other at b'_2 < b_2 (the primed clump
was assembled earlier than the unprimed clump). Now consider
the progenitors of the unprimed clump that are identified
at the epoch b_1 = p b_2. The partition function (eq. 5)
specifies this distribution of progenitor subclumps. So, we
can compute, e.g., the average size of the largest progenitor
subclump. Now consider the primed clump. The scaling
property of the partition function shows that its subclump
distribution will be the same as that of the unprimed clump
at the epoch b'_1 = p b'_2 < b_1, where p = b_1/b_2 has the
same value as for the unprimed clump. Consider the average
size of the largest progenitor of these two clumps as a
function of 'lookback time' from the epochs b_2 and b'_2.

Figure 3. Probability, f(l_1|k), that the largest subclump of a k
particle clump has l_1 particles, for two choices of the ratio p =
b_1/b_2 of the initial and final epochs, and for two choices of k.
Solid lines show the distribution for k = 10 with p = 0.3 and
0.9. Dashed lines show the distribution for k = 50, with the same
values of p.

Since $p$ is the same for the two clumps, they will have
the same average size for the largest subclump at the epochs
$b_1$ and $b'_1 < b_1$. However, the lookback time for the unprimed
and the primed clumps is the time corresponding to
$b_{21} = b_2 - b_1$ and $b'_{21} = b'_2 - b'_1$, respectively. Since $b$ changes ever
more slowly as it increases, $b'_{21} < b_{21}$. This implies that the
average size of the largest progenitor of the primed clump
(which was assembled earlier) decreases more rapidly than
it does for the unprimed clump (which was assembled later).
In other words, when phrased in terms of lookback time, the
evolutionary history of a clump of size $k$ depends on when
it was first assembled.

As discussed above, the evolutionary histories of
clumps, when phrased in terms of lookback time, depend on
the rate of change of $b$. In the previous paragraph we were
able to draw conclusions about the dependence of evolution
on formation epoch because the rate of change of $b$ is a function
of epoch. However, as noted in the Introduction, the rate
of change of $b$ also depends on the background cosmology.
Thus, one also expects the evolutionary histories of clumps
to be sensitive to the background cosmology. For example,
the $b(t)$ curves for different cosmologies show that a clump
of a given mass will have formed at a greater lookback time
in a low density universe than in one with critical density.
These trends, the more recent assembly of larger relative
to smaller clumps, and the sensitivity of merger histories to
the background cosmology, have been noted by Lacey & Cole
(1993) and by Kauffmann (1995) in their study of clustering
from initially Gaussian fields. The implication that similar
mass clumps which were assembled at different times have
different lookback time histories is in qualitative agreement
with the numerical, Monte-Carlo model used by Kauffmann
(1995).

The third feature that is evident in Fig. 3 is simply that
the curves are extremely skew. This means that the average
number of particles in the largest progenitor subclump is not
necessarily a good indicator of the most probable number
of particles in the largest progenitor. Therefore, the curves
of average merger histories given in Fig. 1 of Kauffmann
(1995) should be treated carefully. Note that equation (5)
provides an efficient way of evaluating the difference between
the mean and the most probable sizes.
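
The difference between the mean and the most probable size of the largest progenitor can be explored numerically. The Python sketch below is illustrative only. It assumes that the partition function (eq. 5) has the form $p(n_1,\ldots,n_k|k) = [k(b_2-b_1)]^{m-1} e^{-k(b_2-b_1)}\,\eta(k,b_2)^{-1} \prod_j \eta(j,b_1)^{n_j}/n_j!$, with $\eta$ the Borel distribution and $m = \sum_j n_j$, rewritten so that it depends only on the ratio $p = b_1/b_2$. All function names are hypothetical, and the partitions of $k$ are enumerated by brute force, so this is only practical for modest $k$.

```python
from collections import Counter
from math import factorial


def partitions(k, max_part=None):
    """Yield every partition of k as a non-increasing tuple of parts."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def partition_probability(parts, ratio):
    """Assumed form of the partition function, written in terms of ratio = b1/b2."""
    k, m = sum(parts), len(parts)
    weight = factorial(k) * float(k) ** (m - k)
    weight *= ratio ** (k - m) * (1.0 - ratio) ** (m - 1)
    for j, n_j in Counter(parts).items():
        weight *= (j ** (j - 1) / factorial(j)) ** n_j / factorial(n_j)
    return weight


def largest_progenitor_pmf(k, ratio):
    """Probability that the largest progenitor has l particles, for l = 0..k."""
    pmf = [0.0] * (k + 1)
    for parts in partitions(k):
        pmf[parts[0]] += partition_probability(parts, ratio)
    return pmf


pmf = largest_progenitor_pmf(10, 0.9)
mean_size = sum(size * weight for size, weight in enumerate(pmf))
mode_size = max(range(len(pmf)), key=pmf.__getitem__)
print("mean:", mean_size, "most probable:", mode_size)
```

Because the distribution is skew, the printed mean lies below the most probable value for this choice of $k$ and $p$; other choices can be explored by changing the two arguments.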

In addition to allowing one to calculate the dispersion
around the mean history of any given clump, the partition
function can also be used to compare this Galton-Watson
Poisson Press-Schechter model with N-body simulations. It
is also useful to compare the merger histories described by
equation (5) with the ad hoc, Monte-Carlo merger histories
generated by Kauffmann & White (1993). This will provide
a test of the Galton-Watson model developed here, and may
also provide some insight into the reason for the accuracy of
the Kauffmann-White algorithm.

To effect this comparison, we will plot the ratio of the
largest progenitor clump to the total mass, $l_1/k$, versus the
ratio of the second largest progenitor to the largest progenitor,
$l_2/l_1$. Figures 2 and 3 in Kauffmann & White (1993)
show examples of such plots. Fig. 4 (of this paper) shows
what is essentially the joint probability that the largest two
progenitor subclumps are $l_1$ and $l_2$, given that the final
clump is of size $k$, as determined by the partition function
(eq. 5) for $p = b_1/b_2 = 0.8$ (top panel) and 0.3 (bottom
panel), and for $k = 10$ (solid contours) and $k = 50$ (dotted
contours). Large values of $p$ correspond to small time
differences between the initial and final epochs. The results
are plotted in terms of the ratio $l_1/k$ versus $l_2/l_1$, for ease
of comparison with the work of Kauffmann & White (1993).
The contours are at 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03,
and 0.1. The figure shows that, for a given value of $k$, most
of the clumps lie along a relatively narrow band in the
$(l_1/k)$, $(l_2/l_1)$ plane. The location of the band depends on
the values of $k$ and $p$.

Fig. 4 is qualitatively similar to Figs. 2 and 3 in Kauffmann
& White (1993). At small lookback times (large $p$,
top panel) the band lies at somewhat larger values of $l_1/k$
than at larger lookback times (small $p$, bottom panel). This
simply reflects the fact that, at smaller lookback times, a
larger fraction of the clump survives relatively intact. However,
the band for the clump with larger $k$ is at a lower
value of $l_1/k$ than that for the less massive clump. This is consistent
with the faster assembly required by larger clumps that we
deduced from Fig. 3. The top panel also shows that, in this
(relatively small lookback time) regime, the joint probability
distribution has one peak at large values of $l_1/k$ (and
correspondingly small values of $l_2/l_1$), and a broader, lower
peak around $l_1 \sim k/2$ and $l_2 \sim l_1/2$. Thus,
at small lookback times, it appears that most clumps grow
because a large clump accretes many smaller ones. Those
that do not, grow because of the mergers of objects that are
approximately the same size.

At larger lookback times (bottom panel), the band is
shifted towards lower values of $l_1/k$. Furthermore, the joint
probability distribution becomes peaked towards the lower
right hand corner of the plot, at values of $l_2/l_1 \sim 1$. This,
too, is sensible, since at large lookback times, most clumps
have split up into a large number of small, approximately
equally massive progenitors. These features are in qualitative
agreement with those measured in N-body simulations
(Kauffmann & White 1993, and references therein).

Figure 4. Joint probability that the largest two progenitor subclumps
are $l_1$ and $l_2$, given that the final clump is of size $k$ as
determined by the partition function (eq. 5) for $p = b_1/b_2 = 0.8$
(top panel) and $p = 0.3$ (bottom panel), when $k = 10$ (solid
contours) and $k = 50$ (dotted contours). The results are plotted
in terms of the ratio $l_1/k$ versus $l_2/l_1$, for ease of comparison
with the earlier work described in the text. The contours are at
probability levels of 0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, and
0.1.



5 DISCUSSION 

The partition function that describes the relative probabilities
of all possible merger histories of those Press-Schechter
clumps which form as an initially Poisson distribution
evolves gravitationally can be written in closed
form (eq. 5). The partition function is a function of the initial
and final epochs (denoted by $b_1$ and $b_2$, respectively).
Since the time evolution of $b$ can be computed (cf. the Introduction),
the temporal evolution of the partition function is
known. The counts in cells distribution associated with this
Poisson Press-Schechter distribution can also be computed
by extending the branching process analogy (Appendix D).
It, too, is a function of $b$, so its temporal evolution is also
known. Thus, the Press-Schechter description of clustering
from an initially Poisson distribution is now complete.

It may be worth pointing out that Appendix B shows
that the Poisson Galton-Watson branching process specifies
the Borel clump size distribution (equation 1) and the
merger probabilities (equation 4) uniquely, and also allows
one to describe the merger tree completely (equation 5). On
the other hand, if equations (1) and (4) are treated as the
only known constraints on the form of the merger tree, then
the branching process considered in this paper is not the
only way to construct merger history trees consistent
with these constraints (e.g., Sheth 1995). For instance, one
can always construct an ad hoc Monte-Carlo scheme, like
that of Kauffmann & White (1993), which satisfies equations
(1) and (4), and is not necessarily consistent with the
branching process description of equation (5). For this reason
we have argued that, rather than being completely ad
hoc, the branching process is, indeed, a reasonable model
for the growth of clustering (Figs. 1 and 2 and associated
discussion).

A Poisson distribution is similar to a white noise Gaussian
random field. So, this result can be related to the evolution
of clustering from an initially Gaussian density field
that has a scale free power spectrum with slope $n = 0$. Can
it be extended to describe the growth of gravitational clustering
from Gaussian random fields with arbitrary initial
power spectra?

One way in which to do this is as follows. The Borel distribution
reduces to the Press-Schechter multiplicity function
in the limit as $N \gg 1$ and as $b \to 1$. Similarly, $f(j|k)$
(eq. 4) reduces to the $n = 0$ Gaussian result provided that
$k \gg 1$, $j \gg 1$, $k - j \gg 1$, and $b_1$ and $b_2 \to 1$ (Sheth 1995). For
a given density threshold $\delta_c$ [recall that $b = 1/(1 + \delta_c)$],
Gaussian fields with different power spectra yield different
excursion set Press-Schechter mass multiplicity functions.
However, these differences arise solely because the relation
between the variance and the mass depends on the power
spectrum. When written directly in terms of the variance,
rather than the mass, the Press-Schechter multiplicity functions
and the merger probabilities have a universal form that
is independent of the underlying power spectrum (e.g. Lacey
& Cole 1993). This suggests writing the partition function of
equation (5) in terms of the variance of the Poisson ($n = 0$)
distribution. Then, in the large $l_i$ and $b \to 1$ limits, equation
(5) should reduce to the Gaussian result. The appropriate
Jacobian can then transform the partition function
(written in terms of the variance) into an expression for the
masses of the subclumps.

There may be a more sophisticated way to describe the
merger histories of clumps that form from Gaussian random
fields. In this paper, branching processes and queueing theory
were both used to formulate and solve a problem that
was originally posed in terms of trees associated with random
walk excursions on a discrete Poisson distribution. So
the question is: is it possible to extend any of these relations
to the continuous Gaussian case? For example, is it possible
to formulate the excursion sets of random walks associated
with Gaussian fields, described by Bond et al. (1991) in their
derivation of the Press-Schechter mass functions, in terms
of branching processes? Trees associated with excursions of
random walks on Gaussian fields are the subject of current
interest (Neveu & Pitman 1989; Le Gall 1993). The $b \to 1$
limit is known as the 'critical' Galton-Watson process (the
$b < 1$ case considered in this paper is 'subcritical'). Many
properties of the tree associated with this critical process
have been calculated (Aldous 1993 and references therein).
The application of these ideas to the Press-Schechter description
of clustering from arbitrary Gaussian initial conditions
is in progress.

One final way in which to extend the ideas of this paper
to arbitrary initial conditions is to note that the usual
Press-Schechter mass functions provide a good, but by no
means perfect, description of N-body simulations of clustering.
Thus, one might reconsider the percolation model
discussed in Section 2. First, one would compare the percolation
mass functions (obtained numerically) for the Poisson
case with those of the branching process (the Borel distribution,
equation 1). If the percolation mass functions provide
an acceptable fit to the simulation results, then one could
also compute (numerically) and test the percolation merger
probabilities and merger tree. Of course, it is trivial to apply
the percolation model to arbitrary initial conditions.

The discussion at the end of Section 2 shows clearly that 
this branching process extension of the Press-Schechter ap- 
proach is correct only in a strictly statistical sense. This is 
because, unlike in the percolation model discussed in Sec- 
tion 2, there is no direct correspondence between particles 
in the initial distribution and particles in clumps. That is, 
the Galton-Watson branching process, like the excursion 
set interpretation of the Press-Schechter mass function, is 
a purely statistical model for the growth of clustering. To 
be useful, it relies on the accuracy and applicability of the 
ergodic hypothesis. It cannot, and should not, be expected 
to work on a particle-by-particle basis. 
APPENDIX A: THE POISSON
GALTON-WATSON PROCESS WITH TWO 
TYPES OF PROGENY 

The Poisson Galton-Watson process with $m$ different types
of progeny has been studied by Good (1960). He shows that
if the branching process starts with $r_i$ individuals of type $i$,
for all $1 \le i \le m$, then the probability that the whole tree
contains exactly $n_1$ of type 1, $n_2$ of type 2, and so on, is
equal to the coefficient of

$$ z_1^{n_1 - r_1}\, z_2^{n_2 - r_2} \cdots z_m^{n_m - r_m} $$

in

$$ f_1^{n_1} \cdots f_m^{n_m} \left\| \delta_\mu^\nu - \frac{z_\mu}{f_\mu}\frac{\partial f_\mu}{\partial z_\nu} \right\| , $$

where the double bars denote a determinant, $\delta_\mu^\nu$ is the identity matrix and

$$ f_\mu(z) = \prod_\nu \exp\left[-a_{\mu\nu}(1 - z_\nu)\right] $$

reflects the fact that the probability that an individual of
type $\mu$ has a child of type $\nu$ is a Poisson distribution with
parameter $a_{\mu\nu}$, independently of the other individuals.

For our problem, $m = 2$, and subscript 1 is for girls
and subscript 2 is for boys. Then $a_{11} = a_{21} = b_1$ and $a_{12} =
a_{22} = b_2 - b_1$, and the problem is to extract the relevant
coefficients of

$$
\exp\Big( n_1 \left[ b_1(z_1 - 1) + (b_2 - b_1)(z_2 - 1) \right]
 + n_2 \left[ b_1(z_1 - 1) + (b_2 - b_1)(z_2 - 1) \right] \Big)
 \times \Big[ (1 - z_1 b_1)\left(1 - z_2(b_2 - b_1)\right) - z_2 b_1 z_1 (b_2 - b_1) \Big].
\tag{A1}
$$

Suppose that we are interested in the family tree that was
started by one male ancestor, in which there are $N$ members
of the tree in total. Then we can use equation (A1)
to calculate the probability of having a tree with exactly $N$
members. Let $n$ denote the number of females in the tree.
Then set $n_1 = n$ and $n_2 = N - n$, and let $r_1 = 0$ and $r_2 = 1$.
The answer is obtained by summing up all the coefficients of
terms of order $z_1^{n} z_2^{N-n-1}$. This involves straightforward but
tedious algebra which is not reproduced here. The result is
that the probability of having $N$ members in the tree is

$$
P(N) = \eta(N, b_2) \sum_{n=0}^{N-1} \binom{N-1}{n}
\left(\frac{b_1}{b_2}\right)^{n} \left(1 - \frac{b_1}{b_2}\right)^{N-1-n},
\tag{A2}
$$

where $n$ denotes the number of girls in the tree. Clearly, this
is the same result as that given by the corresponding equation in the main
text.
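
The content of equation (A2) can be illustrated with a small Monte Carlo experiment. The sketch below is not part of the derivation: it simply simulates a two-type Poisson Galton-Watson tree started by one male ancestor, in which every individual has a Poisson number of daughters with mean $b_1$ and a Poisson number of sons with mean $b_2 - b_1$. The function names, the seed and the parameter values are arbitrary choices made for illustration. If (A2) holds, the tree size should follow a Borel distribution with parameter $b_2$ (so the fraction of single-member trees should be close to $e^{-b_2}$), and each of the $N - 1$ descendants should be female with probability $b_1/b_2$.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_tree(b1, b2, rng):
    """Return (total members N, number of females n) for one male ancestor."""
    unprocessed, total, females = 1, 1, 0
    while unprocessed > 0:
        unprocessed -= 1
        daughters = poisson(b1, rng)
        sons = poisson(b2 - b1, rng)
        females += daughters
        total += daughters + sons
        unprocessed += daughters + sons
    return total, females


rng = random.Random(12345)
b1, b2 = 0.3, 0.6
trees = [simulate_tree(b1, b2, rng) for _ in range(20000)]

frac_single = sum(1 for size, _ in trees if size == 1) / len(trees)
girls_in_triples = [girls for size, girls in trees if size == 3]
mean_girls = sum(girls_in_triples) / len(girls_in_triples)

print("P(N = 1):", frac_single, "Borel value:", math.exp(-b2))
print("mean girls when N = 3:", mean_girls, "Binomial value:", 2 * b1 / b2)
```

The loop terminates with probability one because the process is subcritical ($b_2 < 1$).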



APPENDIX B: MERGER PROBABILITIES 
AND THE PARTITION FUNCTION 

Some of the results of this Appendix were suggested by
the recent work of Pitman (in preparation). This Appendix
shows explicitly that the partition function derived in the
main text, equation (5), and the merger probabilities of equation
(4) are consistent with each other. Demonstration of
this consistency is useful because, in the appropriate limit,
the merger rates implied by equation (4) are in good agreement
with N-body simulations of gravitational clustering
(Lacey & Cole 1994). For completeness, this limit is derived
below. In addition, an expression for the mean number of
$j$-sized subclumps of $k$-sized clumps is derived directly from
the partition function. The argument is generalized, at the
end of this Appendix, to obtain an expression for the factorial
moments of this distribution.

Lacey & Cole (1993) define a merger rate by taking the
limit as $\delta_1 \to \delta_2$ in $f(k, \delta_2 | j, \delta_1)$. That is, the merger rate is

$$
\begin{aligned}
\frac{dP(j \to k|\delta)}{d\delta}\, d\delta
&= \lim_{\delta_1 \to \delta_2}
 \frac{(1 - b_2)\, k\, \eta(k, b_2)\, f(j, \delta_1 | k, \delta_2)}{(1 - b_1)\, j\, \eta(j, b_1)} \\
&= \lim_{\delta_1 \to \delta_2}
 \frac{k}{(k-j)!}\, \frac{\delta_2 (\delta_1 - \delta_2)}{\delta_1 (1 + \delta_2)^2}
 \left( \frac{k}{1 + \delta_2} - \frac{j}{1 + \delta_1} \right)^{k-j-1}
 \exp\left( -\frac{k}{1 + \delta_2} + \frac{j}{1 + \delta_1} \right) \\
&\to \frac{d\delta}{\sqrt{2\pi}}\, \frac{k}{(k-j)^{3/2}}\,
 \frac{e^{\delta (k-j)/(1+\delta)}}{(1 + \delta)\,(1 + \delta)^{k-j}} \\
&\to \frac{d\delta}{\sqrt{2\pi}} \left( \frac{k^2}{k-j} \right)^{3/2}
 \frac{e^{-\delta^2 (k-j)/2}}{k^2\, (1 + \delta)}
\end{aligned}
\tag{B1}
$$

(Sheth 1995). The third expression on the right follows from
setting $\delta_1 = \delta_2 = \delta$ and $\delta_1 - \delta_2 = d\delta$, and considering the
limit when $k \gg 1$, $j \gg 1$ and $k - j \gg 1$. Stirling's approximation
for the factorials simplifies the expression considerably,
and the final expression follows from assuming $\delta \ll 1$.
Except for the $(1 + \delta)$ term in the denominator, the final
expression is the same as the expression derived by Lacey
& Cole (1993). In their notation $S_1 \propto 1/j$, $S_2 \propto 1/k$, and
$\omega = \delta$, since the relevant Gaussian corresponding to the
Poisson is the white noise case. So, in the limit where Stirling's
approximation for the factorials is valid, and when
$\delta \ll 1$, equation (B1) here reduces to their equation (2.17).
In this limit, the merger rate implied by equation (4) describes
the simulations well (Lacey & Cole 1994).
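
How quickly the exact rate approaches its Gaussian-like limit can be checked numerically. The sketch below is a hedged illustration rather than part of the derivation: it assumes that the $\delta_1 \to \delta_2 = \delta$ limit of the second line of (B1), per unit $d\delta$, is $k\,(1+\delta)^{-2}\,[(k-j)/(1+\delta)]^{k-j-1} e^{-(k-j)/(1+\delta)}/(k-j)!$, and compares it with the final line of (B1). The function names and the sample values of $j$, $k$ and $\delta$ are arbitrary.

```python
from math import exp, lgamma, log, pi, sqrt


def merger_rate_exact(j, k, delta):
    """Assumed delta1 -> delta2 limit of (B1), per unit d(delta), via logarithms."""
    n = k - j
    log_rate = (
        log(k)
        - 2.0 * log(1.0 + delta)
        + (n - 1) * (log(n) - log(1.0 + delta))
        - n / (1.0 + delta)
        - lgamma(n + 1)  # log of (k - j)!
    )
    return exp(log_rate)


def merger_rate_gaussian_limit(j, k, delta):
    """Final line of (B1): Stirling's approximation together with delta << 1."""
    n = k - j
    prefactor = (k * k / n) ** 1.5 / (sqrt(2.0 * pi) * k * k * (1.0 + delta))
    return prefactor * exp(-0.5 * delta * delta * n)


j, k, delta = 600, 1000, 0.02
ratio = merger_rate_exact(j, k, delta) / merger_rate_gaussian_limit(j, k, delta)
print("exact / limiting form:", ratio)
```

For these values the two expressions agree to well within one per cent; the agreement degrades as $\delta$ grows or as $k - j$ becomes small, as expected from the assumptions behind the limit.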

One other limit of equation (4) is also interesting. When
$k \gg j$, Stirling's approximation for $k!$ and $(k - j)!$ in equation
(4) implies that

$$
f(j, b_1 | k, b_2) \to \left( 1 - \frac{b_1}{b_2} \right)
\frac{(j\, b_1/b_2)^{j-1}}{(j-1)!}\, e^{-j (b_1/b_2)}.
\tag{B2}
$$

This shows that the probability that a randomly chosen
member of a Borel clump that has exactly $k$ members at
the epoch $b_2$ was in a Borel clump with $j \ll k$ particles at
the epoch $b_1$ is, to an excellent approximation, given by a
Borel distribution with parameter $b_1/b_2$. When $b_2 \to 1$, this
means that the probability that a particle is in a clump with
$j - 1$ other particles is given by a Borel distribution with parameter
$b_1$. Since the limit $b_2 \to 1$ is equivalent to requiring
$\delta_2 \to 0$, this is exactly what is required by the derivation
of $f(j|k)$ from the two barrier (excursion set) problem considered
in Sheth (1995). Another application of Stirling's
approximation (to the $j!$ term), with the limits $b_1 \to 1$ and
$b_2 \to 1$, shows that equation (4) is similar to the Lacey &
Cole (1993) expression for the white noise Gaussian case.
Fig. 2 in Sheth (1995) also shows this to be true.

Before deriving the merger probabilities $f(j|k)$ of equation
(4) directly from the partition function (equation 5),
it is useful to consider some combinatorial identities. First,
consider the random variable $X$, and assume that the distribution
of $X$ is Borel with parameter $b$. Then the distribution
of $S_m = X_1 + \cdots + X_m$, where the $X_i$ are independent
random variables, each drawn from a Borel distribution with
parameter $b$, is given by the Borel-Tanner distribution. That
is, the probability that $S_m = k$ (i.e., the sum of $m$ independent
Borel variables equals $k$) is

$$
P(b, S_m = k) = \frac{m}{k}\, \frac{(kb)^{k-m}\, e^{-kb}}{(k-m)!},
\qquad \text{where } k \ge m
\tag{B3}
$$

(Tanner 1953, 1961). It is easy to see that this expression is
sensible, since



$$
\begin{aligned}
\sum_{j=1}^{k-m+1} \eta(j, b)\, P(b, S_{m-1} = k - j)
&= \sum_{j=1}^{k-m+1} \frac{(jb)^{j-1} e^{-jb}}{j!}\,
 \frac{m-1}{k-j}\, \frac{[(k-j)b]^{k-j-m+1}\, e^{-(k-j)b}}{(k-j-m+1)!} \\
&= (m-1)\, \frac{b^{k-m} e^{-kb}}{(k-m)!}
 \sum_{i=0}^{k-m} \binom{k-m}{i} (i+1)^{i-1} (k-1-i)^{k-m-i-1} \\
&= P(b, S_m = k)
\end{aligned}
\tag{B4}
$$

as it should. The final expression follows from Abel's generalization
of the Binomial theorem (see equations 14 and 20
in section 1.5 of Riordan 1979 or equation 46 of Sheth 1995).
Iterating this process shows that

$$
\begin{aligned}
P(b, S_m = k)
&= \sum_{l_1=1}^{k-m+1} \eta(l_1, b)\, P(b, S_{m-1} = k - l_1) \\
&= \sum_{l_1=1}^{k-m+1} \eta(l_1, b) \times
 \sum_{l_2=1}^{k-l_1-m+2} \eta(l_2, b)\, P(b, S_{m-2} = k - l_1 - l_2) \\
&= \sum_{\text{all } m \text{ parts}} \eta(l_1, b)\, \eta(l_2, b) \cdots \eta(l_m, b) \\
&= \sum_{\text{all } m \text{ parts}} \eta(1, b)^{n_1} \cdots \eta(k, b)^{n_k},
\end{aligned}
\tag{B5}
$$

where $n_1 + \cdots + n_k = m$, and $l_1 + \cdots + l_m = \sum_{j=1}^{k} j\, n_j = k$,
and the sum in the two final expressions is over all distinct
ordered partitions of $k$ that have exactly $m$ parts. Notice
that not all the terms in this sum are different. For example,
when $m = 3$ and $k = 6$, then the set $\{123\}$ occurs six
times, and when $m = 3$ and $k = 7$, then the set $\{223\}$ occurs
thrice. In general, a given set $\{n_1, \cdots, n_k\}$ will occur
$m!/(n_1! \cdots n_k!)$ times.
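
Equations (B3) to (B5) are easy to check numerically for small $m$ and $k$. The following sketch is illustrative, with hypothetical function names: it builds the distribution of $S_m$ by direct $m$-fold convolution of Borel variables, which is the sum over ordered partitions in (B5), and compares the result with the Borel-Tanner formula (B3). It also counts the number of distinct orderings of a given set of parts, $m!/(n_1! \cdots n_k!)$.

```python
from collections import Counter
from math import exp, factorial


def borel(n, b):
    """Borel distribution eta(n, b) = (n b)^(n-1) exp(-n b) / n!."""
    return (n * b) ** (n - 1) * exp(-n * b) / factorial(n)


def borel_tanner(m, k, b):
    """Equation (B3): probability that m independent Borel variables sum to k."""
    if k < m:
        return 0.0
    return (m / k) * (k * b) ** (k - m) * exp(-k * b) / factorial(k - m)


def convolve_borel(m, k, b):
    """P(S_m = k) by summing over all ordered partitions, as in (B5)."""
    dist = {0: 1.0}
    for _ in range(m):
        updated = {}
        for total, weight in dist.items():
            for part in range(1, k - total + 1):
                key = total + part
                updated[key] = updated.get(key, 0.0) + weight * borel(part, b)
        dist = updated
    return dist.get(k, 0.0)


def orderings(parts):
    """Number of distinct orderings of a multiset of parts: m! / (n_1! ... n_k!)."""
    count = factorial(len(parts))
    for multiplicity in Counter(parts).values():
        count //= factorial(multiplicity)
    return count


print(convolve_borel(3, 8, 0.4), borel_tanner(3, 8, 0.4))
print(orderings((1, 2, 3)), orderings((2, 2, 3)))
```

The two numbers on the first printed line should agree to rounding error, and the second line reproduces the counts of six and three quoted in the text.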



Let $p(n_1 \cdots n_k | m, k)$ denote the probability that a given
set $\{n_1, \cdots, n_k\}$ occurred, given that there were exactly $m$
terms which added up to $k$. Then $p(n_1 \cdots n_k | m, k)$ is given
by summing up all the terms in equation (B5) corresponding
to it, and normalizing by $P(b, S_m = k)$. If it is not
certain that there were exactly $m$ terms in the partition,
then we must multiply $p(n_1 \cdots n_k | m, k)$ by the probability
that there were exactly $m$ terms that added up to $k$. Thus,
$p(n_1 \cdots n_k | k)$, which is conditioned on $k$ only and not on $m$
as well, is given by an expression like

$$
\frac{\eta(1, b)^{n_1}\, \eta(2, b)^{n_2} \cdots \eta(k, b)^{n_k}}{P(b, S_m = k)}\,
\frac{m!}{n_1! \cdots n_k!} \times n(m|k),
$$

where $n(m|k)$ denotes the probability that $k$ is the sum of exactly
$m$ integers. Now, Appendix A showed that the branching
process we are considering requires $n(m|k)$ to have the
Binomial distribution given in the main text. Thus, the probability
that a given set $\{n_1, \cdots, n_k\}$ occurs is



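$$
\begin{aligned}
p(n_1 \cdots n_k | k)
&= \frac{\eta(1, b_1)^{n_1} \cdots \eta(k, b_1)^{n_k}}{P(b_1, S_m = k)}\,
 \frac{m!}{n_1! \cdots n_k!}\,
 \binom{k-1}{m-1} \left( 1 - \frac{b_1}{b_2} \right)^{m-1}
 \left( \frac{b_1}{b_2} \right)^{k-m} \\
&= \frac{k!\, (b_2 - b_1)^{m-1}\, e^{k b_1}}{k^{k-m}\, b_2^{\,k-1}}
 \prod_{j=1}^{k} \frac{\eta(j, b_1)^{n_j}}{n_j!}.
\end{aligned}
\tag{B6}
$$

Simple algebra shows that this final expression is equivalent
to equation (5). This shows explicitly how the partition
function is related to sums of Borel-distributed random variables.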

We are finally in a position to show that equation (4)
follows from equation (5). The probability that a particle
which is chosen at random from a clump with exactly $k$
particles at the epoch $b_2$ was in a subclump with $j$ particles
at the epoch $b_1$ is

$$ \sum_{\text{all part.}} \frac{j\, n_j}{k}\, p(n_1, \cdots, n_k | k), $$

where the sum is over all partitions of $k$. This sum can be
written as

$$
\sum_{m} \sum_{\text{all part.}} \frac{j}{k}\, \eta(j, b_1)\,
\frac{\eta(1, b_1)^{n_1} \cdots \eta(j, b_1)^{n_j - 1} \cdots \eta(k, b_1)^{n_k}}{P(b_1, S_m = k)}\,
\frac{(m-1)!}{n_1! \cdots (n_j - 1)! \cdots n_k!}\; m\, n(m|k),
\tag{B7}
$$

which is the same as

$$
\sum_{m=1}^{k-j+1} \frac{j}{k}\, \eta(j, b_1)\,
\frac{P(b_1, S_{m-1} = k - j)}{P(b_1, S_m = k)}\; m\, n(m, b_1 | k, b_2),
\tag{B8}
$$

where it is understood that when $m = 1$ then $P(b_1, S_{m-1} =
k - j) = P(b_1, S_0 = 0) = 1$, and $P(b_1, S_0 = k - j) = 0$ for
$j \ne k$. Writing all the terms in the sum explicitly gives



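$$
\begin{aligned}
&\sum_{m=2}^{k-j+1} \frac{j}{k}\, \frac{(j b_1)^{j-1} e^{-j b_1}}{j!}\,
 \frac{m-1}{k-j}\, \frac{[(k-j) b_1]^{k-j-m+1}\, e^{-(k-j) b_1}}{(k-j-m+1)!}
 \times \frac{k}{m}\, \frac{(k-m)!}{(k b_1)^{k-m}\, e^{-k b_1}}\;
 m \binom{k-1}{m-1} \left( 1 - \frac{b_1}{b_2} \right)^{m-1} \left( \frac{b_1}{b_2} \right)^{k-m} \\
&\quad = \frac{j^j}{j!}\, \frac{(k-1)!}{(k-j)!}\, \frac{k^2}{k^k}
 \left( 1 - \frac{b_1}{b_2} \right) \left( \frac{b_1}{b_2} \right)^{j-1}
 \sum_{m=2}^{k-j+1} \binom{k-j-1}{m-2}
 \left[ k \left( 1 - \frac{b_1}{b_2} \right) \right]^{m-2}
 \left[ (k-j) \frac{b_1}{b_2} \right]^{k-j-m+1}.
\end{aligned}
\tag{B9}
$$

The Binomial theorem reduces this final expression to equation
(4). This shows explicitly that

$$
\sum_{\text{all part.}} \frac{j\, n_j}{k}\, p(n_1, \cdots, n_k | k) = f(j, b_1 | k, b_2)
\tag{B10}
$$

as expected. Thus, the merger probabilities of equation (4)
can be derived directly from the partition function (equation
5), which means that the merger probabilities and the
partition function are mutually consistent.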

The steps leading to equation (B10) imply that the
mean number of subclumps each having exactly $j$ particles
that are incorporated in a clump with exactly $k$ particles is

$$
\sum_{\text{all part.}} n_j\, p(n_1, \cdots, n_k | k) = \frac{k}{j}\, f(j, b_1 | k, b_2).
\tag{B11}
$$

This is the same result as that obtained using a different
argument (equation 45 in Sheth 1995). However, with the
partition function, it is now possible to compute the higher
order moments of this distribution as well. For completeness
the factorial moments are computed below. These higher
order moments are useful for estimating the scatter around
the mean number of $j$-clumps per $k$-clump. Also, they may
be good discriminators between different partition functions
that yield the same $f(j|k)$ merger probabilities. The factorial
moments are

$$
\begin{aligned}
\langle n_j (n_j - 1) \cdots (n_j - i + 1) \rangle
&= \sum_{m} \eta(j, b_1)^{i}\, \frac{m!}{(m-i)!}\,
 \frac{P(b_1, S_{m-i} = k - ij)}{P(b_1, S_m = k)}\; n(m, b_1 | k, b_2) \\
&= \frac{k!}{(k - ij)!}
 \left[ \frac{j^{j-1}}{j!} \left( \frac{b_1}{b_2} \right)^{j-1} \left( 1 - \frac{b_1}{b_2} \right) \right]^{i}
 \frac{k^{i+1}}{k^{k}}
 \left[ k \left( 1 - \frac{b_1}{b_2} \right) + (k - ij) \frac{b_1}{b_2} \right]^{k-ij-1}.
\end{aligned}
\tag{B12}
$$

Since the partition function is known, it is also straightforward
to compute 'cross-correlation' type moments of the
form $\langle n_i n_j \rangle$, and the associated factorial moments, though
we have not done so here.
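
The consistency relations of this Appendix can be verified by brute force for small $k$. The sketch below is a hedged illustration and its function names are hypothetical. It assumes that the partition function depends on the epochs only through $p = b_1/b_2$, in the form $k!\, k^{m-k} p^{k-m} (1-p)^{m-1} \prod_j [j^{j-1}/j!]^{n_j}/n_j!$, and that the merger probability of the main text has the closed form $f(j|k) = k (1-p) \binom{k}{j} (j^j/k^k)\, p^{j-1} (k - jp)^{k-j-1}$. Both forms are assumptions of this example. It then compares sums over all partitions with (B10) and with a closed form for the factorial moments.

```python
from collections import Counter
from math import comb, factorial, perm


def partitions(k, max_part=None):
    """Yield every partition of k as a non-increasing tuple of parts."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def partition_probability(parts, p):
    """Assumed partition function in terms of p = b1/b2."""
    k, m = sum(parts), len(parts)
    weight = factorial(k) * float(k) ** (m - k) * p ** (k - m) * (1.0 - p) ** (m - 1)
    for j, n_j in Counter(parts).items():
        weight *= (j ** (j - 1) / factorial(j)) ** n_j / factorial(n_j)
    return weight


def merger_probability(j, k, p):
    """Assumed closed form for f(j, b1 | k, b2)."""
    return (k * (1.0 - p) * comb(k, j) * j ** j / k ** k
            * p ** (j - 1) * (k - j * p) ** (k - j - 1))


def factorial_moment(j, i, k, p):
    """<n_j (n_j - 1) ... (n_j - i + 1)> by summing over all partitions of k."""
    return sum(perm(Counter(parts)[j], i) * partition_probability(parts, p)
               for parts in partitions(k))


def factorial_moment_closed(j, i, k, p):
    """Closed form for the same moment, valid when i * j <= k."""
    base = j ** (j - 1) / factorial(j) * p ** (j - 1) * (1.0 - p)
    return (factorial(k) / factorial(k - i * j) * base ** i
            * float(k) ** (i + 1 - k) * (k - i * j * p) ** (k - i * j - 1))


k, p, j = 8, 0.6, 2
mass_fraction = j * factorial_moment(j, 1, k, p) / k  # left-hand side of (B10)
print(mass_fraction, merger_probability(j, k, p))
print(factorial_moment(j, 2, k, p), factorial_moment_closed(j, 2, k, p))
```

Each printed pair should agree to rounding error if the assumed forms are mutually consistent.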



APPENDIX C: MERGING, BRANCHING, AND 
THE THEORY OF QUEUES 

The Borel distribution (eq. 1) also arises in studies of the
distribution of waiting times in queues (Borel 1942; Tanner
1953; Tanner 1961). The fact that queueing theories
and the Galton-Watson branching process are closely related
(Kendall 1951) will be exploited in this Appendix.

Consider a counter at which customers are served. Assume
that customers arrive at the counter in a Poisson process
with parameter unity (an average of one arrival per
unit time) and that the service time is the same constant,
$0 < b < 1$, for each customer. If the counter is busy when a
customer arrives, the customer joins the back of a queue. So,
service is on a first-come-first-served basis. In such a system,
we can ask for the probability that exactly $N$ customers are
served before the queue is first emptied, given that there
was only one customer in the queue initially. This probability
is given by the Borel distribution with parameter $b$ (Borel
1942; Tanner 1953; Consul 1989). The relation of this queue
to the Galton-Watson branching process described above is
clear. Simply view the $N$ customers that were served before
the queue was first emptied as the descendants of (the ones
who were 'infected' by) the initial customer.

Now modify the queue system as follows. Assume, as
before, that customers arrive at a counter in a Poisson process
with unit rate, and that the service time is the same
constant, say $b_2$, for each customer. However, in this case,
the customers form two queues in accordance with the following
prescription. If a new customer arrives within a time
$b_1 < b_2$ of the most recent commencement of service, they
join the back of a high priority queue, H. If they arrive
later than $b_1$ after the most recent commencement of service,
they join the back of a low priority queue, L. All customers
in queue H are serviced, in the order in which they
arrived, until the queue is empty. When this happens, the
first customer in queue L moves into queue H and is serviced
immediately. Thus, while customers within each queue are
serviced on a first-come-first-served basis, it is possible for
some customers in queue H to receive service before others
in queue L, even though they may have arrived later.
In this sense, the modified system does not operate on a
strictly first-come-first-served basis.

For such a system, we can ask for the probability that
exactly $k$ customers are served before both queues are completely
emptied for the first time, given that there was only
one customer in queue H and none in queue L initially.
Clearly, the answer to this question is no different than before:
this probability must be given by the Borel distribution
with parameter $b_2$. However, in the time before both
queues were emptied for the first time, having serviced exactly
$k$ customers, queue H may have been emptied a number
of times (though certainly not more than $k$ times). Suppose
that it was emptied $m$ times. Define a batch of customers
as the number, $l$, of customers served between two
successive empty periods of queue H. Then we can ask for
the probability, $p(l_1, l_2, \cdots, l_m | k)$, that queue H was emptied
$m$ times, and that customers were served in batches
of $l_1, l_2, \cdots, l_m$ (not necessarily in that order), given that
$l_1 + l_2 + \cdots + l_m = k$, and that there was only one customer
in queue H initially. Then $p(l_1, l_2, \cdots, l_m | k)$ is the same as
that defined in the Galton-Watson process considered earlier
in this paper. So, for this queue system, it is given by
equation (5).
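
The two-queue system described above is straightforward to simulate, and doing so gives a direct feel for how service batches correspond to progenitor subclumps. The sketch below is illustrative only; the function name, seed and parameter values are arbitrary. It simulates one busy period, starting with a single customer in queue H, and returns the list of batch sizes $l_1, \ldots, l_m$ in the order in which they were served. Under the description in the text, the total $k = \sum_i l_i$ should be Borel distributed with parameter $b_2$, and the mean number of batches should be $1 + (\langle k \rangle - 1)(1 - b_1/b_2)$ with $\langle k \rangle = 1/(1 - b_2)$.

```python
import math
import random


def simulate_busy_period(b1, b2, rng):
    """Return the batch sizes served between successive emptyings of queue H."""
    batches = []
    low = 0  # customers currently waiting in queue L
    while True:
        high, served = 1, 0  # each batch starts with one customer in queue H
        while high > 0:
            high -= 1
            served += 1
            # Unit-rate Poisson arrivals during one service interval of length b2.
            t = rng.expovariate(1.0)
            while t < b2:
                if t < b1:
                    high += 1  # arrived early in the service interval: queue H
                else:
                    low += 1   # arrived late in the service interval: queue L
                t += rng.expovariate(1.0)
        batches.append(served)
        if low == 0:
            return batches  # both queues are now empty
        low -= 1  # first customer of queue L moves into queue H


rng = random.Random(2024)
b1, b2 = 0.3, 0.6
runs = [simulate_busy_period(b1, b2, rng) for _ in range(20000)]

frac_single = sum(1 for batches in runs if sum(batches) == 1) / len(runs)
mean_batches = sum(len(batches) for batches in runs) / len(runs)
expected_batches = 1.0 + (1.0 / (1.0 - b2) - 1.0) * (1.0 - b1 / b2)

print("P(k = 1):", frac_single, "Borel value:", math.exp(-b2))
print("mean number of batches:", mean_batches, "expected:", expected_batches)
```

Tabulating the sorted batch lists for a fixed total $k$ gives a Monte Carlo estimate of $p(l_1, \ldots, l_m | k)$ that can be compared with the partition function.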

In the context of this queue system, in equation (5), $\eta(l, b)$
is the Borel distribution with parameter $b$ (eq. 1), and $n_j$
denotes the number of times exactly $j$ customers passed
through queue H before it was emptied, given that $k$ passed
through it before both H and L were emptied. The first
term on the right of equation (5) accounts for the fact that
the probability of serving exactly $l_i$ customers between two
successive empty periods in the H queue is a Borel distribution,
and so it weights the occurrence of each $l_i$ in the
sequence of service batches with the probability, $\eta(l_i, b_1)$,
that $l_i$ occurred. The second term on the right accounts for
the different permutations of the different service sequences,
since the various permutations of $\{l_1, \cdots, l_m\}$ all contribute
to the same $p(l_1, l_2, \cdots, l_m | k)$. The final term looks like a
Poisson weighting term, and is obtained by an argument
that is similar to that used by Tanner (1961) in his elegant
derivation of the Borel distribution.

Consider the case in which both queues are empty, having
serviced $k$ customers since the last time they were both
empty. If the H queue was emptied $m$ times during this busy
period, then $m - 1$ customers must have passed through
queue L during this time. So, we can ask for the probability
that $m - 1$ customers arrived at and passed through queue
L during this time. However, in Tanner's (1961) language,
not all possible arrival patterns are 'admissible', since the
arrivals in queue L must be consistent with the pattern of
service times in queue H. Namely, during each of the first
$m - 1$ instants at which queue H is emptied, there must be
at least one customer in queue L, but the $m$th time that H
is emptied, queue L must also be empty. The problem is to
list all possible sequences of arrivals at queue L, and then
to determine the fraction of these that are admissible.

Recall that customers arrive in a Poisson process with
unit rate at a queueing system that has constant service
time $b_2 < 1$ per customer, and the first customer joins queue
H without passing through queue L. If a customer arrives
during the first fraction, $b_1/b_2$, of the service time, then they
join queue H, otherwise they join queue L. If $k$ customers
were served in total, then there were $k$ opportunities for
customers to join queue L. Each of these $k$ opportunities was
of duration $b_2 - b_1$. Since the arrival of customers is random,
the probability that $m - 1$ customers passed through queue L
during the time in which $k$ customers were served is given by
the Poisson distribution,
$[k(b_2 - b_1)]^{m-1}\, e^{-k(b_2 - b_1)}/(m-1)!$.
However, the probability that the $m - 1$ arrivals were in
an admissible order introduces an additional factor of $1/m$.
This sets the final term in equation (5).

In terms of the Galton-Watson branching process considered
in the main text (in which the probability that any
parent has $n_1$ daughters is a Poisson distribution with parameter
$b_1$, and the probability that that parent also had $n_2$
sons is a Poisson distribution with parameter $b_2 - b_1$), equation
(5) lists the probability that a family of size $k$ is made
of the $m$ subfamilies (each with a male at the head and only
females in the subsequent generations), having $l_1, l_2, \cdots, l_m$
members in each. So, for the gravitational clustering process,
the probability that a clump of size $k$ at the epoch $b_2$ had
the $m$ progenitors $l_1, l_2, \cdots, l_m$ at the epoch $b_1$ is given by
equation (5). As noted in section 3, equation (5) can be used
to generate the partition function of merger history trees.



This way of deriving the partition function, by gener- 
alizing Tanner's (1961) argument, has another connection 
to previous work. In effect, it is an alternative derivation of 
the excursion set scaling solution derived in section 3.2 of 
Sheth (1995). This follows because Tanner showed how his 
queueing system could be formulated in terms of an excur- 
sion set process. Here we have described a queueing system 
that is associated with the Poisson Galton-Watson branching
process. Comparison of this queue with the excursion set
process considered in section 3.2 of Sheth (1995) shows that 
they are equivalent. 



APPENDIX D: COUNTS-IN-CELLS FROM THE 
GALTON-WATSON BRANCHING PROCESS 

The branching process extension of the Press-Schechter ap- 
proach is very powerful. In the main text it was used to 
provide a description of the merger history tree. However, 
following a calculation suggested by Consul (1989), it is also 
relatively straightforward to use the Galton-Watson branching
process to provide a derivation of the distribution of
counts in randomly placed cells, at any epoch characterized 
by b. Assume that Press-Schechter Borel clumps collapse 
completely to points, so that if the cluster center is included 
in a cell, all associated particles are also. Now assume that 
all clumps evolve in accordance with the Press-Schechter 
description developed in the main text. 

The analogy with the Galton-Watson process is as follows.
The number of particles in a given cell is the same as
the total number of progeny of the Galton-Watson process.
However, unlike the Press-Schechter mass function considered
in the main text, in this case the number of initial
ancestors, $X_0$, is not unity. Rather, it is the number of cluster
centers, $X_0 = m$, say, that happen to be in the cell.
So, for a cell containing m cluster centers, rather than
considering the total number of progeny of one ancestor (the
Borel distribution with parameter b), we need to calculate
the total number of progeny given that there are $X_0 = m$
ancestors in the zeroth generation. This is the same as the
Poisson Galton-Watson process, conditioned on the number
of ancestors being $X_0 = m$, rather than unity. Both
the branching process (Consul 1989) and the queueing theory
(Tanner 1953) formulations of this problem show that
the probability that there are exactly N particles in the cell
given that there are exactly m cluster centers in the cell is



\[
P(N|m) = \frac{m}{N}\,\frac{(Nb)^{N-m}\, e^{-Nb}}{(N-m)!} , \tag{D1}
\]



which is the Borel-Tanner distribution of Appendix C.
When m = 1 it reduces to the Borel distribution with
parameter b.
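
The Borel-Tanner form can be checked against a direct simulation of the branching process. The Python sketch below is a minimal illustration, assuming 0 < b < 1: it evaluates the distribution of total progeny from m ancestors, $(m/N)(Nb)^{N-m} e^{-Nb}/(N-m)!$, and compares it with the empirical frequencies obtained when each individual has a Poisson number of offspring with mean b. The function names, the random seed and the parameter values are all illustrative choices, not taken from the paper.

```python
import math
import random


def borel_tanner_pmf(N, m, b):
    """Probability of N total progeny from m ancestors, for 0 < b < 1 and m >= 1."""
    if m < 1 or N < m:
        return 0.0
    log_p = (math.log(m / N) + (N - m) * math.log(N * b)
             - N * b - math.lgamma(N - m + 1))
    return math.exp(log_p)


def poisson_sample(rng, mean):
    """Draw one Poisson variate (Knuth's method; adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def total_progeny(rng, m, b):
    """Total family size when each of m ancestors starts a Poisson(b) branching process."""
    pending, total = m, m            # individuals yet to reproduce, and all born so far
    while pending > 0:
        children = poisson_sample(rng, b)
        total += children
        pending += children - 1      # one parent is done; its children join the queue
    return total


rng = random.Random(1)
m, b, trials = 3, 0.5, 20000
samples = [total_progeny(rng, m, b) for _ in range(trials)]
for N in range(m, m + 4):
    # Columns: N, simulated frequency, Borel-Tanner probability.
    print(N, samples.count(N) / trials, borel_tanner_pmf(N, m, b))
```

The simulated frequencies should agree with the analytic values to within sampling noise, and setting m = 1 in `borel_tanner_pmf` gives the Borel distribution.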

The final key idea is to assume that, since the initial
distribution is Poisson, any randomly placed cell will include a
random number $X_0$ of cluster centers. That is, the distribution
of $X_0$ will be Poisson, with a parameter that is specified
by the epoch, labelled by b, at which the cell is placed, and
by the size of the cell. This model for the number of cluster
centers in a randomly placed cell, given that the initial
distribution was Poisson, is consistent with linear theory (e.g.
Peebles 1980), and is motivated by a scaling argument that
can be applied to the Poisson Press-Schechter description
(section 3.2 in Sheth 1995). However, the scaling argument
is equivalent to the queueing theory interpretation of the
Galton-Watson partition function of merger history trees
(see discussion at the end of Appendix C). Thus, the
assumption that the number of clusters in a randomly placed
cell is a Poisson random variable is consistent with the
partition structure derived earlier in this paper.

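When the distribution of $X_0$ is Poisson with parameter
$N_{\rm cl}$, then the probability that any randomly placed cell
contains exactly N particles is

\[
\sum_{m=0}^{N} p(m)\,P(N|m)
 = \sum_{m=0}^{N} \frac{N_{\rm cl}^{m}\, e^{-N_{\rm cl}}}{m!}\,P(N|m)
 = \frac{N_{\rm cl}}{N!}\,(N_{\rm cl} + Nb)^{N-1}\, e^{-N_{\rm cl} - Nb} . \tag{D2}
\]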
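
The equality between the Poisson-weighted sum and the closed form can be verified numerically. The following Python sketch is illustrative only and assumes 0 < b < 1: it sums the Poisson probability of m cluster centers times the Borel-Tanner probability of N particles given m centers, and compares the result with $N_{\rm cl}(N_{\rm cl} + Nb)^{N-1} e^{-N_{\rm cl} - Nb}/N!$. The function names and the values of `n_cl` and `b` are hypothetical choices.

```python
import math


def borel_tanner_pmf(N, m, b):
    """Probability of N particles given m cluster centers (0 < b < 1, m >= 1)."""
    if m < 1 or N < m:
        return 0.0
    log_p = (math.log(m / N) + (N - m) * math.log(N * b)
             - N * b - math.lgamma(N - m + 1))
    return math.exp(log_p)


def poisson_pmf(m, mean):
    """Poisson probability of m cluster centers in the cell."""
    return math.exp(m * math.log(mean) - mean - math.lgamma(m + 1))


def counts_by_sum(N, n_cl, b):
    """Sum over the number of cluster centers, weighting each term by its Poisson probability."""
    if N == 0:
        return poisson_pmf(0, n_cl)   # an empty cell contains no cluster centers
    return sum(poisson_pmf(m, n_cl) * borel_tanner_pmf(N, m, b)
               for m in range(1, N + 1))


def counts_closed_form(N, n_cl, b):
    """Closed form n_cl (n_cl + N b)^(N-1) exp(-n_cl - N b) / N!."""
    log_f = (math.log(n_cl) + (N - 1) * math.log(n_cl + N * b)
             - n_cl - N * b - math.lgamma(N + 1))
    return math.exp(log_f)


n_cl, b = 2.0, 0.4
for N in range(6):
    # Columns: N, explicit sum, closed form.
    print(N, counts_by_sum(N, n_cl, b), counts_closed_form(N, n_cl, b))
```

The two columns should coincide to floating point accuracy for every N.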



Setting $N_{\rm cl} = \bar{n}V(1 - b)$, where $\bar{n}$ is the average density of
particles and V is the size of the cell, is required by
normalization. This shows that equation (D2) implies that the
probability that a randomly placed cell of size V contains
exactly N particles is

\[
f(N, V) = \frac{\bar{N}(1 - b)}{N!}\,\left[\bar{N}(1 - b) + Nb\right]^{N-1}
          e^{-\bar{N}(1 - b) - Nb} , \tag{D3}
\]

where $\bar{N} = \bar{n}V$ and the left hand side now shows the volume
dependence explicitly.
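
A short numerical check makes the role of this choice of the Poisson parameter concrete. The Python sketch below, a minimal illustration assuming 0 <= b < 1, evaluates the counts in cells distribution with the number of cluster centers per cell set to nV(1 - b), and confirms that the probabilities sum to unity and that the mean count is nV. The function name `counts_in_cells` and the parameter values are illustrative assumptions.

```python
import math


def counts_in_cells(N, nbar, V, b):
    """Probability that a cell of size V holds N particles, for 0 <= b < 1."""
    n_cl = nbar * V * (1.0 - b)       # mean number of cluster centers per cell
    log_f = (math.log(n_cl) + (N - 1) * math.log(n_cl + N * b)
             - n_cl - N * b - math.lgamma(N + 1))
    return math.exp(log_f)


nbar, V, b = 2.5, 2.0, 0.4            # illustrative values, so nbar * V = 5
probs = [counts_in_cells(N, nbar, V, b) for N in range(400)]
total = sum(probs)
mean = sum(N * p for N, p in enumerate(probs))
print(total, mean)                    # compare with 1 and with nbar * V
```

Setting b = 0 in this sketch recovers a Poisson distribution with mean nV, which corresponds to an unclustered distribution.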



Equation (D3) is also a solution to the Saslaw & Hamilton
(1984) thermodynamic model of nonlinear gravitational
clustering. It can be understood as describing a Poisson
distribution of point sized clumps, where the probability
a clump has N associated particles is given by a Borel
distribution with parameter b (Saslaw 1989). Thus,
equations ([j]) and (D3) can be derived from a thermodynamic
model (Saslaw & Hamilton 1984; Sheth 1995), from an
analysis of the excursion set statistics of overdense regions of
a Poisson distribution (Sheth 1995), and from the Poisson
Galton-Watson branching process (Consul 1989).

In equation (D3), b is constant, independent of cell
size V. This is a consequence of assuming that all clumps 
are point sized. Relaxing this assumption means that the 
branching process is no longer straightforward to implement. 
Nevertheless, we can use the Poisson cluster interpretation 
of this branching process derivation of f(N, V) to gain some 
understanding of the shape of the counts in cells distribution 
when the point sized approximation is relaxed. The counts 
in cells distribution of a Poisson distribution of Borel (with 
parameter b) Press-Schechter clumps that have nontrivial 
sizes and shapes is well approximated by equation (D3), except
that b becomes scale dependent. It tends to zero as
$V \to 0$ and it tends to a constant value as V becomes larger
than the typical clump size (Sheth & Saslaw 1994).

Whereas the usual Press-Schechter analysis of excur- 
sion set mass functions provides information about the dis- 
tribution of virialized clump masses, it does not provide 
information about the internal structure of these clumps, 
nor does it describe how these clumps are distributed rela- 
tive to each other in space. The clumps may be correlated 
with each other, or distributed uniformly at random. Thus, 
one cannot compute the N-point correlation functions of
the clustered distribution, nor can one construct the nonlinear
counts in cells distribution function. In this respect, the
Press-Schechter description does not provide a complete
description of nonlinear clustering. Therefore, it is very
interesting that numerical simulations of gravitational clustering
from an initially Poisson distribution confirm the accuracy
of equation (D3) on all scales, as well as the scale dependence
and temporal evolution of b (Sheth & Saslaw 1994 and
references therein). This measured accuracy of equation (D3),
and the way in which it can be derived from the branching
process, suggests one way in which the Press-Schechter ap- 
proach may be extended to provide some information about 
the counts in cells distribution. 

Before concluding, we note that the partition structure 
derived in the main text is independent of the accuracy or 
applicability of the results of this Appendix. That is, the 
results of the main text are independent of whether or not 
the Borel clusters have a Poisson spatial distribution. 



